
feat(databases): add --url flag to tables load for remote parquet files#88

Merged
eddietejeda merged 1 commit into main from feat/databases-tables-load-url
May 19, 2026

eddietejeda commented May 19, 2026

Summary

  • Adds --url <url> flag to hotdata databases tables load, mutually exclusive with --file and --upload-id
  • Validates the URL ends in .parquet before fetching
  • Streams the remote file directly to POST /files using reqwest's blocking client (already a project dependency), with no temp file on disk
  • Shows a progress bar with bytes transferred and ETA when the server sends Content-Length, and falls back to a spinner when it does not
  • Extracts a shared finish_upload helper to deduplicate the upload and error-handling logic between the --file and --url paths
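A minimal sketch of the streaming-with-progress idea, using only std so it runs standalone. It assumes the response body is consumed through std::io::Read (reqwest::blocking::Response implements Read, and Response::content_length() returns an Option<u64>). ProgressMode, ProgressReader and status() are hypothetical names, not the PR's code, and a std::io::Cursor and std::io::sink() stand in for the HTTP response and the POST /files request body.

```rust
use std::io::{self, Read};

/// How progress is reported while streaming (hypothetical type).
#[derive(Debug, PartialEq)]
enum ProgressMode {
    /// Content-Length known: show bytes transferred out of a total (bar + ETA).
    Bar { total: u64 },
    /// No Content-Length: show an indeterminate spinner.
    Spinner,
}

impl ProgressMode {
    fn from_content_length(content_length: Option<u64>) -> Self {
        match content_length {
            Some(total) => ProgressMode::Bar { total },
            None => ProgressMode::Spinner,
        }
    }
}

/// Wraps a reader (e.g. an HTTP response body) and counts bytes as they
/// pass through, so the data can be piped straight into the upload
/// without being written to a temp file first.
struct ProgressReader<R> {
    inner: R,
    mode: ProgressMode,
    bytes_read: u64,
}

impl<R: Read> ProgressReader<R> {
    fn new(inner: R, content_length: Option<u64>) -> Self {
        Self { inner, mode: ProgressMode::from_content_length(content_length), bytes_read: 0 }
    }

    /// Text a real CLI would render as a bar or a spinner label.
    fn status(&self) -> String {
        match self.mode {
            ProgressMode::Bar { total } => format!("{}/{} bytes", self.bytes_read, total),
            ProgressMode::Spinner => format!("{} bytes", self.bytes_read),
        }
    }
}

impl<R: Read> Read for ProgressReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.bytes_read += n as u64;
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    // Server sent Content-Length: progress is reported against a total.
    let mut with_len = ProgressReader::new(io::Cursor::new(vec![0u8; 10_000]), Some(10_000));
    io::copy(&mut with_len, &mut io::sink())?;
    println!("{}", with_len.status()); // prints "10000/10000 bytes"

    // No Content-Length: only a running byte count is available.
    let mut no_len = ProgressReader::new(io::Cursor::new(vec![0u8; 2_048]), None);
    io::copy(&mut no_len, &mut io::sink())?;
    println!("{}", no_len.status()); // prints "2048 bytes"
    Ok(())
}
```

Because the wrapper only observes bytes in flight, memory use stays bounded by the copy buffer regardless of file size; a real progress bar would be updated from the same counter.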

Test plan

  • Tested against NYC TLC yellow taxi data (3.7M rows): hotdata databases tables load taxi yellow_tripdata --url https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2026-01.parquet
  • Non-parquet URL is rejected with a clear error before any network request
  • URL returning non-2xx is reported as an error
  • --url + --file together are rejected by clap
  • --url + --upload-id together are rejected by clap
  • All 152 existing tests pass (cargo test)
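The URL-related test-plan items (reject a non-parquet URL before any network request, report a non-2xx response as an error) could be pinned down as unit tests if the network call is injected. A minimal sketch, assuming a hypothetical load_from_url(url, fetch) where fetch stands in for the reqwest GET; the ends_with(".parquet") check mirrors the PR's current rule, and the error messages are placeholders.

```rust
use std::cell::Cell;

/// Validates `url` first and only then calls `fetch`, so a rejected URL
/// never triggers a network request. `fetch` stands in for the HTTP GET.
fn load_from_url<F>(url: &str, fetch: F) -> Result<Vec<u8>, String>
where
    F: FnOnce(&str) -> Result<Vec<u8>, String>,
{
    if !url.ends_with(".parquet") {
        return Err(format!("URL must point to a .parquet file: {url}"));
    }
    fetch(url)
}

fn main() {
    let calls = Cell::new(0);
    // Fake fetch that records how often it was called.
    let fake_fetch = |_: &str| -> Result<Vec<u8>, String> {
        calls.set(calls.get() + 1);
        Ok(vec![b'P', b'A', b'R', b'1'])
    };

    // Non-parquet URL: rejected, and the fetch is never invoked.
    assert!(load_from_url("https://example.com/data.csv", fake_fetch).is_err());
    assert_eq!(calls.get(), 0);

    // Parquet URL: passes validation and reaches the fetch.
    assert!(load_from_url("https://example.com/data.parquet", fake_fetch).is_ok());
    assert_eq!(calls.get(), 1);

    // A non-2xx response surfaces as an error returned by the fetch.
    let not_found = |_: &str| -> Result<Vec<u8>, String> { Err("HTTP 404 Not Found".to_string()) };
    let err = load_from_url("https://example.com/missing.parquet", not_found).unwrap_err();
    assert!(err.contains("404"));
    println!("all checks passed");
}
```

Tests of this shape need no network access, which is one way the uncovered lines flagged in the coverage report could be exercised.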

🤖 Generated with Claude Code

Allows loading a remote parquet file directly by URL without a local
download step. The CLI fetches the URL via reqwest, streams it to
POST /files, and shows a progress bar (bytes+eta if Content-Length is
present, spinner otherwise).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sentry (bot) commented May 19, 2026

Codecov Report

❌ Patch coverage is 0% with 55 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/databases.rs | 0.00% | 53 Missing ⚠️ |
| src/main.rs | 0.00% | 2 Missing ⚠️ |


Comment thread src/databases.rs
use crossterm::style::Stylize;
eprintln!("{}", crate::util::api_error(resp_body).red());
fn upload_parquet_url(api: &ApiClient, url: &str) -> String {
if !is_parquet_path(url) {
Contributor

nit: is_parquet_path checks ends_with(".parquet") against the raw URL, so it rejects URLs with query strings or fragments. For example, S3/GCS presigned URLs like https://bucket.s3.amazonaws.com/file.parquet?X-Amz-Signature=... would be turned away even though they point at a valid parquet file. Consider parsing the URL and validating only the path component (or stripping ?/# before the extension check) so signed URLs work. (not blocking)
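A minimal sketch of the reviewer's second option, stripping everything from the first ? or # before the extension check, using only std. is_parquet_url is a hypothetical replacement for is_parquet_path, not code from the PR. The first option (parsing with a URL library and checking only the path component) handles a few more edge cases, for example a bare host name that happens to end in .parquet.

```rust
/// Returns true if the URL, ignoring any query string (`?...`) or
/// fragment (`#...`), ends in `.parquet`. Hypothetical helper.
fn is_parquet_url(url: &str) -> bool {
    // Cut at the first '?' or '#'; what remains is scheme, host and path.
    let end = url.find(|c: char| c == '?' || c == '#').unwrap_or(url.len());
    url[..end].ends_with(".parquet")
}

fn main() {
    let examples = [
        // Presigned S3-style URL: accepted once the query string is ignored.
        "https://bucket.s3.amazonaws.com/file.parquet?X-Amz-Signature=abc123",
        // A fragment is ignored as well.
        "https://example.com/data.parquet#part-1",
        // Plain URL from the PR's test plan.
        "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2026-01.parquet",
        // Rejected: ".parquet" only appears inside the query string.
        "https://example.com/export.csv?name=data.parquet",
    ];
    for url in examples {
        println!("{:<5} {}", is_parquet_url(url), url);
    }
}
```

Cutting at the first delimiter, rather than searching for ".parquet" anywhere in the string, is what keeps the last example from slipping through.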

Comment thread src/databases.rs

let body: serde_json::Value = match serde_json::from_str(&resp_body) {
Ok(v) => v,
let resp = match reqwest::blocking::get(url) {
Contributor

super nit: reqwest::blocking::get uses a default client with no timeout, so a remote that accepts the TCP connection but never responds will hang the CLI indefinitely (only Ctrl-C will get out). Consider building a reqwest::blocking::Client with connect_timeout / timeout and using it here. (not blocking)
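The failure mode described here, a peer that completes the TCP handshake but never sends a byte, can be reproduced with std alone. This sketch uses std::net rather than reqwest, so it only illustrates the idea; in the CLI the corresponding settings are the ones the reviewer names, connect_timeout and timeout on reqwest::blocking::ClientBuilder. first_byte_with_timeout is a hypothetical helper, and the 200 ms read timeout is an arbitrary demo value.

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::{Duration, Instant};

/// Connects to `addr`, sends a minimal HTTP request, and waits for the
/// first response byte, giving up after `read_timeout` instead of hanging.
fn first_byte_with_timeout(addr: SocketAddr, read_timeout: Duration) -> std::io::Result<u8> {
    // Bound the TCP handshake as well as the wait for data.
    let mut stream = TcpStream::connect_timeout(&addr, Duration::from_secs(5))?;
    stream.set_read_timeout(Some(read_timeout))?;
    stream.write_all(b"GET /file.parquet HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut buf = [0u8; 1];
    stream.read_exact(&mut buf)?;
    Ok(buf[0])
}

fn main() -> std::io::Result<()> {
    // A "server" that never calls accept() or writes anything; the kernel
    // still completes the handshake via the listen backlog.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let start = Instant::now();
    let err = first_byte_with_timeout(addr, Duration::from_millis(200))
        .expect_err("a silent server should time out");
    // Unix reports an expired read timeout as WouldBlock; Windows as TimedOut.
    assert!(matches!(err.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut));
    println!("gave up after {:?}: {}", start.elapsed(), err);

    drop(listener); // keep the listener alive until the demo is done
    Ok(())
}
```

Without the read timeout, read_exact here would block until the process is interrupted. Setting both limits explicitly on the reqwest client, as suggested, also makes the intended behavior visible in the code instead of relying on library defaults.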

eddietejeda merged commit 01751f4 into main on May 19, 2026
11 checks passed